Last.fm

Author : Kartik Jagdale (https://github.com/kartikjagdale)

Last.fm is a music discovery service that gives you personalised recommendations based on the music you listen to.

Here we are going to apply some machine learning and data analysis to a Last.fm dataset in order to recommend the next songs to the user.

We are going to use the NearestNeighbors algorithm to predict the next songs that a user will like to hear.

Note: The dataset retrieved from Last.fm [LastFM_Matrix.csv] contains 1257 records and 285 songs.


In [4]:
# First, import some essential libraries
import os
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity # For calculating similarity matrix
from sklearn.neighbors import NearestNeighbors

In [5]:
DIR_PATH = os.getcwd()  # Get the current directory

lfm = pd.read_csv(os.path.join(DIR_PATH, "LastFM_Matrix.csv"))  # Load the dataset
lfm.head()  # Display the head of the dataset


Out[5]:
user a perfect circle abba ac/dc adam green aerosmith afi air alanis morissette alexisonfire ... timbaland tom waits tool tori amos travis trivium u2 underoath volbeat yann tiersen
0 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 33 0 0 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 42 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 51 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 62 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 286 columns

Let's list some of the column names in the dataset: the user column followed by the songs.


In [6]:
songs = pd.DataFrame(lfm.columns)
songs.head(10)


Out[6]:
0
0 user
1 a perfect circle
2 abba
3 ac/dc
4 adam green
5 aerosmith
6 afi
7 air
8 alanis morissette
9 alexisonfire

Now let's keep only the song columns and make a new DataFrame


In [7]:
lfm_songs = lfm.drop("user", axis=1)  # Drop the user column
lfm_songs.head()  # Show the head


Out[7]:
a perfect circle abba ac/dc adam green aerosmith afi air alanis morissette alexisonfire alicia keys ... timbaland tom waits tool tori amos travis trivium u2 underoath volbeat yann tiersen
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 285 columns


In [8]:
lfm_songs.shape #gives out total rows and columns


Out[8]:
(1257, 285)

Calculate the cosine similarity between every pair of songs in order to get the similarity matrix


In [9]:
data_similarity = cosine_similarity(lfm_songs.T)  # Transpose so that each row is a song, then compare songs pairwise
data_similarity


Out[9]:
array([[ 1.        ,  0.        ,  0.01791723, ...,  0.06506   ,
         0.05216405,  0.        ],
       [ 0.        ,  1.        ,  0.05227877, ...,  0.        ,
         0.02536731,  0.        ],
       [ 0.01791723,  0.05227877,  1.        , ...,  0.02039967,
         0.13084898,  0.        ],
       ..., 
       [ 0.06506   ,  0.        ,  0.02039967, ...,  1.        ,
         0.        ,  0.        ],
       [ 0.05216405,  0.02536731,  0.13084898, ...,  0.        ,
         1.        ,  0.02969569],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.02969569,  1.        ]])
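As a rough illustration of what the similarity matrix contains, here is a minimal NumPy sketch of cosine similarity between song columns. The tiny `plays` array is hypothetical toy data, not taken from LastFM_Matrix.csv, and the sketch assumes every song has at least one listener so that no norm is zero.

```python
import numpy as np

# Hypothetical toy data: 4 users (rows) x 3 songs (columns); 1 means "listened"
plays = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
])

# Transpose so that each row is one song's listening vector
songs_as_rows = plays.T.astype(float)

# Scale every song vector to unit length
norms = np.linalg.norm(songs_as_rows, axis=1, keepdims=True)
unit = songs_as_rows / norms

# Dot products of unit vectors are the cosine similarities
toy_similarity = unit @ unit.T
print(toy_similarity)
```

Songs 0 and 1 share one listener, so their similarity is positive, while song 2 shares no listeners with the others and scores 0 against both. The diagonal is 1 because every song is identical to itself.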

Now that we have the similarity matrix, let's use the nearest-neighbours algorithm to predict the recommendations. First, though, we will label the matrix.


In [10]:
type(data_similarity)


Out[10]:
numpy.ndarray

Let's convert it into a DataFrame


In [11]:
data_similarity_df = pd.DataFrame(data_similarity, columns=(lfm_songs.columns), index=(lfm_songs.columns))

In [12]:
data_similarity_df.head()# similarity Matrix


Out[12]:
a perfect circle abba ac/dc adam green aerosmith afi air alanis morissette alexisonfire alicia keys ... timbaland tom waits tool tori amos travis trivium u2 underoath volbeat yann tiersen
a perfect circle 1.000000 0.000000 0.017917 0.051554 0.062776 0.000000 0.051755 0.060718 0 0.000000 ... 0.047338 0.081200 0.394709 0.125553 0.030359 0.111154 0.024398 0.06506 0.052164 0.000000
abba 0.000000 1.000000 0.052279 0.025071 0.061056 0.000000 0.016779 0.029527 0 0.000000 ... 0.000000 0.000000 0.000000 0.061056 0.029527 0.000000 0.094916 0.00000 0.025367 0.000000
ac/dc 0.017917 0.052279 1.000000 0.113154 0.177153 0.067894 0.075730 0.038076 0 0.088333 ... 0.044529 0.067894 0.058241 0.039367 0.000000 0.087131 0.122398 0.02040 0.130849 0.000000
adam green 0.051554 0.025071 0.113154 1.000000 0.056637 0.000000 0.093386 0.000000 0 0.025416 ... 0.000000 0.146516 0.083789 0.056637 0.082169 0.025071 0.022011 0.00000 0.023531 0.088045
aerosmith 0.062776 0.061056 0.177153 0.056637 1.000000 0.000000 0.113715 0.100056 0 0.061898 ... 0.052005 0.029735 0.025507 0.068966 0.033352 0.000000 0.214423 0.00000 0.057307 0.000000

5 rows × 285 columns


In [13]:
data_similarity_df.index.is_unique  # Check that no song is repeated


Out[13]:
True

Now we will apply the NearestNeighbors algorithm to the similarity matrix to get the recommendations


In [14]:
neigh = NearestNeighbors(n_neighbors=285)
neigh.fit(data_similarity_df) # Fit the data


Out[14]:
NearestNeighbors(algorithm='auto', leaf_size=30, metric='minkowski',
         metric_params=None, n_neighbors=285, p=2, radius=1.0)

In [15]:
# Copy the neighbour indices to a new DataFrame
model = pd.DataFrame(neigh.kneighbors(data_similarity_df, return_distance=False))
model.head()  # Gives integer positions instead of song names


Out[15]:
0 1 2 3 4 5 6 7 8 9 ... 275 276 277 278 279 280 281 282 283 284
0 0 277 81 70 189 206 108 235 264 80 ... 216 147 60 90 159 254 261 57 32 218
1 1 221 88 165 174 175 83 208 113 103 ... 230 33 213 172 19 79 162 150 125 241
2 2 128 172 36 190 75 182 116 258 140 ... 218 39 263 248 57 68 179 261 17 32
3 3 255 267 25 276 47 84 104 266 59 ... 213 11 90 20 238 79 92 162 150 125
4 4 281 157 158 115 93 106 78 103 262 ... 253 10 19 162 22 241 39 125 20 150

5 rows × 285 columns
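To make the integer output easier to read, here is a small sketch of the same call on a hypothetical 3 x 3 similarity matrix (the values are made up for illustration). With the default settings shown in Out[14] (metric='minkowski', p=2), each row of the matrix is treated as a feature vector and neighbours are ranked by Euclidean distance between rows, not by the similarity values directly.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical similarity matrix for 3 songs: songs 0 and 1 are alike, song 2 is different
sim = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

# Ask for every song as a neighbour, as the notebook does with n_neighbors=285
nn = NearestNeighbors(n_neighbors=3).fit(sim)
distances, indices = nn.kneighbors(sim)

print(indices)    # Row i lists song positions ordered from nearest to farthest
print(distances)  # Matching Euclidean distances between the rows
```

Because the query points are the same rows the model was fitted on, the first entry in each row is the song itself, at distance 0. That is why column 0 of the output above repeats the row number.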


In [16]:
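# Map integer positions back to song names; index plain NumPy arrays, since newer pandas versions reject 2-D indexing on an Index
final_model = pd.DataFrame(data_similarity_df.columns.to_numpy()[model.to_numpy()], index=data_similarity_df.index)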

In [17]:
final_model.head() #preview final Model


Out[17]:
0 1 2 3 4 5 6 7 8 9 ... 275 276 277 278 279 280 281 282 283 284
a perfect circle a perfect circle tool dredg deftones nine inch nails porcupine tree godsmack staind the smashing pumpkins dream theater ... red hot chili peppers katy perry coldplay ensiferum leona lewis the kooks the pussycat dolls christina aguilera beyonce rihanna
abba abba robbie williams elvis presley madonna michael jackson mika duffy queen groove coverage frank sinatra ... slipknot billy talent rammstein metallica arctic monkeys disturbed linkin park killswitch engage in flames system of a down
ac/dc ac/dc iron maiden metallica black sabbath nirvana die toten hosen motorhead hammerfall the offspring judas priest ... rihanna bloc party the shins the decemberists christina aguilera death cab for cutie modest mouse the pussycat dolls arcade fire beyonce
adam green adam green the libertines the strokes babyshambles tom waits bright eyes editors franz ferdinand the streets cocorosie ... rammstein amon amarth ensiferum as i lay dying subway to sally disturbed equilibrium linkin park killswitch engage in flames
aerosmith aerosmith u2 led zeppelin lenny kravitz guns n roses eric clapton genesis dire straits frank sinatra the rolling stones ... the killers all that remains arctic monkeys linkin park atreyu system of a down bloc party in flames as i lay dying killswitch engage

5 rows × 285 columns
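The step from integer positions to song names relies on NumPy integer-array indexing. The sketch below shows the idea on hypothetical placeholder names (`song_a`, `song_b`, `song_c`) and a made-up neighbour table; it is an illustration, not the notebook's data.

```python
import numpy as np
import pandas as pd

# Hypothetical song names, in the same order as the similarity matrix columns
names = np.array(["song_a", "song_b", "song_c"])

# Hypothetical neighbour positions, one row per song, nearest first
neighbor_idx = np.array([
    [0, 1, 2],
    [1, 0, 2],
    [2, 1, 0],
])

# Indexing a 1-D array with a 2-D integer array returns a 2-D array of names
neighbor_names = pd.DataFrame(names[neighbor_idx], index=names)
print(neighbor_names)
```

Each cell of `neighbor_names` is the name found at the position stored in the matching cell of `neighbor_idx`, so the shape of the table is preserved.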

The model above ranks all 285 songs for each song, but we want only the top 10 recommendations, so let's trim the DataFrame a bit


In [18]:
top10 = final_model[list(final_model.columns[:11])]  # Column 0 is the song itself, so 11 columns give 10 recommendations

In [19]:
top10.head()


Out[19]:
0 1 2 3 4 5 6 7 8 9 10
a perfect circle a perfect circle tool dredg deftones nine inch nails porcupine tree godsmack staind the smashing pumpkins dream theater opeth
abba abba robbie williams elvis presley madonna michael jackson mika duffy queen groove coverage frank sinatra hans zimmer
ac/dc ac/dc iron maiden metallica black sabbath nirvana die toten hosen motorhead hammerfall the offspring judas priest bloodhound gang
adam green adam green the libertines the strokes babyshambles tom waits bright eyes editors franz ferdinand the streets cocorosie queens of the stone age
aerosmith aerosmith u2 led zeppelin lenny kravitz guns n roses eric clapton genesis dire straits frank sinatra the rolling stones deep purple
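Since column 0 only repeats the song itself, one possible variation is to skip it when slicing. This is a sketch on a hypothetical ranked table with placeholder names; `top_n` is an illustrative parameter, not something defined in the notebook.

```python
import pandas as pd

# Hypothetical ranked table: column 0 is the song itself, later columns are neighbours
ranked = pd.DataFrame(
    [["song_a", "song_b", "song_c", "song_d"],
     ["song_b", "song_a", "song_d", "song_c"]],
    index=["song_a", "song_b"],
)

top_n = 2  # Number of recommendations to keep per song

# Positions 1..top_n hold the recommendations; position 0 is dropped
recs = ranked.iloc[:, 1:top_n + 1]
print(recs)
```

Keeping column 0, as the notebook does, makes the CSV easier to eyeball; dropping it gives a table that contains only recommendations.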

Now let's put our results in a CSV file called top10.csv


In [20]:
top10.to_csv("top10.csv", index_label="Index")  # Store the data in a CSV file

Now let's read the CSV file back to check that it was saved


In [21]:
pd.read_csv("top10.csv").head()  # Read back the file written above


Out[21]:
Index 0 1 2 3 4 5 6 7 8 9 10
0 a perfect circle a perfect circle tool dredg deftones nine inch nails porcupine tree godsmack staind the smashing pumpkins dream theater opeth
1 abba abba robbie williams elvis presley madonna michael jackson mika duffy queen groove coverage frank sinatra hans zimmer
2 ac/dc ac/dc iron maiden metallica black sabbath nirvana die toten hosen motorhead hammerfall the offspring judas priest bloodhound gang
3 adam green adam green the libertines the strokes babyshambles tom waits bright eyes editors franz ferdinand the streets cocorosie queens of the stone age
4 aerosmith aerosmith u2 led zeppelin lenny kravitz guns n roses eric clapton genesis dire straits frank sinatra the rolling stones deep purple

Conclusion

To conclude, we have created a model that recommends the next song a user will like to hear, using Last.fm data.

Further, we can now use this model to build an API and use it in a website or web app to recommend songs to the user.
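As a starting point for such an API, a lookup helper might look like the sketch below. The function name `recommend`, its parameters, and the small table are all hypothetical; in practice the table would be the top10 DataFrame built earlier.

```python
import pandas as pd

# Hypothetical stand-in for the top10 table: column 0 is the song itself
table = pd.DataFrame(
    [["song_a", "song_b", "song_c"],
     ["song_b", "song_a", "song_c"]],
    index=["song_a", "song_b"],
)


def recommend(song, table, n=2):
    """Return up to n recommended songs for `song`, or an empty list if it is unknown."""
    if song not in table.index:
        return []
    # Skip position 0, which is the song itself
    return table.loc[song].iloc[1:n + 1].tolist()


print(recommend("song_a", table))
print(recommend("unknown_song", table))
```

Returning an empty list for unknown songs keeps the caller simple; a web endpoint could translate that into a "not found" response instead.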

Github Link : https://github.com/kartikjagdale/Last.fm-Song-Recommender

